Thank you for downloading.
This is the patent novelty data of
Martin Watzinger, and Schnitzer, Monika. Standing on the shoulders of science. No. 13766. CEPR Discussion Papers, 2019.
in CSV (comma-separated) and Stata 14 format.

For details on data construction, please consult the appendix of this paper.

The main variables of interest are "novelty", "novelty_r" and "novelty_res".
- novelty is our measure for patent novelty based on keyword combinations.
- novelty_r is novelty residualized for the number of words (used to calculate the measure) and the filing year of the patent. The measure is also discretized and winsorized at the 2.5th and 97.5th percentiles.
=> This is the measure you most likely want to use. Accounting for the number of words removes systematic biases due to differing text length, and accounting for the filing year removes the overall time trend. Winsorizing limits the influence of a few outliers. Zero is the novelty of an average patent in a given year, independent of technology.
- novelty_res is additionally residualized for tech x filing year and winsorized at the 2.5th and 97.5th percentiles. Here zero is a patent with average novelty in a tech x year combination.
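
For readers unfamiliar with winsorizing, the following is a minimal sketch of the general technique in Python. It is not the code used to build the data: the percentile convention (linear interpolation), the function names and the sample values are all assumptions made for illustration. Please see the appendix of the paper for the actual construction.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list.

    The interpolation rule is an assumption; other conventions exist.
    """
    if not sorted_vals:
        raise ValueError("empty input")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def winsorize(values, lower=2.5, upper=97.5):
    """Clamp every value to the [lower, upper] percentile range."""
    s = sorted(values)
    lo_cut = percentile(s, lower)
    hi_cut = percentile(s, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


# Made-up scores: 39 ordinary values plus two extreme ones.
scores = [float(i) for i in range(1, 40)] + [-500.0, 500.0]
clipped = winsorize(scores)
```

Note that values beyond the cutoffs are pulled back to the cutoff rather than dropped, so the number of observations stays the same.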
You can merge arbitrary other data by using the publication number (publn_nr). These are US patents from 1980 to 2012.
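
As an illustration of such a merge, here is a minimal sketch in Python using only the standard library. The two in-memory CSV snippets, the patent numbers in them and the "citations" column are made up for the example; only the column names publn_nr and novelty_r come from this data set.

```python
import csv
import io

# Hypothetical stand-ins for the novelty CSV and your own patent-level file.
novelty_csv = io.StringIO(
    "publn_nr,novelty_r\n"
    "4000001,0.12\n"
    "4000002,-0.30\n"
)
other_csv = io.StringIO(
    "publn_nr,citations\n"
    "4000001,7\n"
    "4000003,2\n"
)


def read_by_key(handle, key="publn_nr"):
    """Read a CSV into a dict keyed by the given column (kept as a string)."""
    return {row[key]: row for row in csv.DictReader(handle)}


novelty = read_by_key(novelty_csv)
other = read_by_key(other_csv)

# Inner join: keep only patents that appear in both files.
merged = [
    {**other[nr], **novelty[nr]}
    for nr in sorted(novelty.keys() & other.keys())
]
```

Keeping publn_nr as a string on both sides avoids silent mismatches when one file stores the number as text and the other as an integer.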

Please also check the (hopefully descriptive) variable labels in the Stata format.

If you use the data, please cite
Martin Watzinger, and Schnitzer, Monika. Standing on the shoulders of science. No. 13766. CEPR Discussion Papers, 2019.

If you have any questions or suggestions, please email
Martin.Watzinger@econ.lmu.de

Thanks!


Variable list
publn_auth - Publication authority
publn_nr - Patent number
novelty - Patent novelty
novelty_r - Patent novelty adjusted for filing year and number of words
novelty_res - Patent novelty additionally adjusted for IPC technology class x year
mean_word_age - Average word age
min_word_age - Age of youngest word in patent
novelty_p10, novelty_p50, novelty_p90, novelty_sd - Percentiles (10th, 50th, 90th) and standard deviation of word novelty
number_words - Number of words used to calculate the measure